education are discussed, and how these reforms are likely to affect U.S. 
schools is explored. Data collected annually by the Council of Chief State 
School Officers (CCSSO) indicate that almost all states have some form of 
assessment that is administered to all students at one or more grade levels 
across the state. In addition, in 1995-96, almost all states had developed, 
or were in the process of developing, content standards defining what 
students should know and be able to do. Widespread belief that schools are 
not helping all students achieve at the levels they are capable of reaching 
has spurred reform efforts. Student assessment is at the top of the list of 
things to reform, since it is considered a way to set more appropriate 
targets for students and to focus staff development and curriculum reform. 

New content standards may require new assessment methods, whether 
short-answer, open-ended, extended-response, or other innovative forms, 
including performance-based assessment. Some of the most severe challenges 
states face in implementing innovative assessment are due to the practical 
aspects of large-scale, statewide testing programs. Technical challenges such 
as scaling, reporting, generalizability, and sampling issues must also be 
considered. The CCSSO has undertaken activities to develop new types of 
assessments and to engage in research about performance assessment. 
Particularly promising are approaches that coordinate assessment at the 
state, district, and classroom levels. The CCSSO is a leader in studying this 
type of coordination. (Contains 14 references.) (SLD) 

Introduction 

Currently, much discussion is taking place about the quality of American 
schools, the skills needed by students, and the ways we should be assessing 
these achievements. Student assessment is viewed nationally as the pivotal 
piece around which school reform and improvement in the nation's schools 
turn. For example, student assessment is the key piece of Goals 2000, as well 
as other federal legislation such as the Improving America's Schools Act 
(IASA). 

The result is that substantially more assessment is likely to occur in our nation's 
schools, and to take place in areas traditionally not assessed (such as the arts), 
using assessment strategies (such as performance assessments and portfolios) 
not typically used. States and local districts are reconsidering the models for 
systems of assessment and how assessment at the state and local levels can 
be coordinated to achieve the reforms desired in education. This digest lays 
out some of the reasons for the reform of assessment and how these reforms 
will likely affect our schools. 

Data collected annually by the Council of Chief State School Officers 
(CCSSO) and the North Central Regional Educational Laboratory (NCREL) 
indicate that almost all states have some form of assessment that is 
administered to all students at one or more grade levels across the state 
(CCSSO/NCREL, 1996). 

According to data reflecting the programs operated in 1995-96, virtually 
every state had developed or was developing content standards defining what 
students should know and be able to do. Many states were also 
in the process of revising their assessment programs to reflect these 
standards. 



Why Is School Reform Occurring? 

Widespread belief that schools are not helping all students achieve at the 
levels they are capable of, or at the levels needed, has spurred efforts to 
reform our schools. Concerns have been raised that the ways we teach 
students, as well as assess them, do not lead students to acquire needed 
knowledge or skills, nor help them apply and use their knowledge and skills 
appropriately. At the national and state levels, content standards containing 
the types of knowledge, skills, and behaviors now believed needed for all 
students to achieve at high levels are being developed. Starting with such 
efforts as the National Council of Teachers of Mathematics' Curriculum and 
Evaluation Standards for School Mathematics (NCTM, 1989), content 
standards are being developed in the arts, civics, economics, English, foreign 
languages, geography, health education, history, physical education, science, 
and social studies. 

School reform is also motivated by the belief that there are competencies 
needed for graduates to enter the workforce successfully. The Secretary's 
Commission on Achieving Necessary Skills developed a list of generic 
competencies and foundation skills that all workers will need in the future (U.S. 
Department of Labor, 1991). Such skills include flexible problem solving, 
respecting the desires of the customer, working well on teams, taking 
responsibility for one's own performance, and continuous learning. These skills 
were identified to guide the efforts of educational reform in the 
direction of helping more students make the transition to work successfully. 

Collectively, these standards represent substantial challenges for 
American schools. They imply that all students will need to achieve at much 
higher levels. New strategies for assessment are also implied by these content 
standards. 

How Does Reform of Assessment Fit School Reform? 

Student assessment is at the top of the list of things for policymakers at 
the national and state levels to tinker with, since it is viewed as a means to 
set more appropriate targets for students, focus staff development efforts for 
the nation's teachers, encourage curriculum reform, and improve instruction 
and instructional materials in a variety of subject matters and disciplines 
(Darling-Hammond & Wise, 1985). Assessment is an important part of the 
equation because it is widely believed that what gets assessed is what gets 
taught, and that the format of assessment influences the format of learning 
and teaching (O'Day & Smith, 1993). The hope of policymakers is that the 
changes in assessment will bring about the needed changes not only in 
students, but also in the ways schools are organized (Linn, 1987; Madaus, 1985). 
Interest in performance assessment has also been justified on the basis that 
using such measures will accomplish (or at least promote) educational equity 
(National Center on Education and the Economy, 1989). Student assessment 
carries a heavy load these days! 

Of course, outside pressure from external testing programs can be ignored or 
resisted by local educators (Smith and Cohen, 1991). There is also ample 
evidence of the distortions in teaching that external testing programs can 
create (Shepard & Smith, 1988). Rather than encourage reform of teaching, 
inappropriate teaching to the test may occur (as opposed to teaching to the 
broader domain covered by the test). Rather than creating opportunities for 
all students to learn to high levels, even new forms of assessment may lead to 
tracking and limiting opportunities for some students (Darling-Hammond, 1994; 
Oakes, 1985). 

Assessment reform should occur along with professional development, 
instructional development, and other strategies designed to assure that all of 
the changes are mutually supportive. Coordination of assessment reform at 
the national and state levels with assessments at the local level is also 
important, so that together they present a coherent view of student 
performance rather than simply being "stuck" together. 

Types of Assessments 




An essential element of the design of assessment is the choice of exercise 
type(s). New content standards may require different assessment methods. 
Among the assessment techniques now being considered are short-answer, 
open-ended; extended-response, open-ended; individual interview; 
individually- or group-administered performance events; individual or group 
performance tasks in which students have extended time; projects; portfolios; 
observations of students; and anecdotal records, in addition to 
multiple-choice exercises. A broader repertoire of techniques is increasingly 
being used. 

Useful Assessment Designs 

Typically, student achievement is measured with available student test data, 
often using information from district or state testing programs. Information 
collected less formally at the classroom levels is not typically included in school 
improvement plans, even though such information could provide valuable 
insights into student learning. 

The nature of information needs should form the basis for an assessment 
design. In a top-down model, policymakers develop an assessment design 
that meets their needs, hoping the data may be useful to persons at lower 
levels. An alternative is to build the assessment system needed at the local 
level, aggregating the information upwards to the district, state and national 
levels. 

Another model, based on the assumption that multiple approaches will allow 
different users' needs to be met, is to develop a comprehensive assessment 
system that uses different assessment formats for different users. 
Various assessment strategies can be implemented together at the different 
levels to provide for the different information needs in a coordinated, 
coherent manner (Darling-Hammond, 1994). 

For example, local districts can adopt a portfolio system for improving 
instruction, while the state carries out matrix-sampling across important 
standards. The information collected by the state can become part of the 
student's portfolio, thereby strengthening the quality of the information 
contained in students' portfolios. The state could also provide opportunities for 
teachers to learn to score the open-ended written and performance 
assessments, thereby enhancing teachers' capabilities of observing and rating 
student performances in their classrooms. 

In this case, the elements of the assessment system at the different levels build 
on and support the elements at the other levels. It is also anticipated that 
information collected at the different levels can be reported in a more 
understandable manner, since the same standards apply in different ways. 

This assessment model enhances the reforms of schools so many desire. 

Practical Challenges Inherent in Using New Forms of Assessment 

Some of the severest challenges that states face in implementing innovative 
approaches to assessment are due to the practical aspects of statewide 
testing programs. For example, programs that desire to use more 
constructed-response assessment exercises find that such types of exercises 
are more time-consuming, so that either they require more testing time overall 
(perhaps more than schools will permit to be devoted to such external testing 
programs) or the technical quality of the data may be compromised by using 
too few items for the types of reports of assessment results to be provided. 

This is true particularly for programs in which detailed individual student results 
or sub-scores of the various standards that comprise the assessment will be 
reported. However, the value of the external assessment program may be 
judged on the utility of the data that is returned to parents and teachers; the 
more specific the information (by student and/or sub-skill), the better in the 
view of teachers and parents. 

The testing time limits (which do vary from one state to another) can impact 
the quality of the assessment in another way: the breadth of the assessment. 
Since innovative assessment exercises take more time per exercise for 
students to complete, if testing time is not expanded, the number of exercises 
used may be cut sharply with the result that fewer aspects of the state's 
content standards are assessed (even if the assessments are more authentic). 
While "teaching to the test" when the assessment is more authentic may result 
in more authentic "practice," the impact of reduced "coverage" of the 
standards could be a narrower curricular focus to the assessment (and 
instruction prompted by the assessment) than might have been encouraged 
by an assessment that was primarily comprised of multiple-choice items. 
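
The trade-off between testing time and coverage described above can be illustrated with a small arithmetic sketch. All of the figures used here (a 120-minute testing window, 30 content standards, 1 minute per multiple-choice item, and 20 minutes per constructed-response exercise) are hypothetical and are not drawn from any state program. The sketch also assumes, for simplicity, that each exercise addresses exactly one standard.

```python
def exercises_that_fit(testing_minutes, minutes_per_exercise):
    """Number of whole exercises that fit in a fixed testing window."""
    return testing_minutes // minutes_per_exercise


def standards_covered(num_exercises, num_standards):
    """Upper bound on standards assessed if each exercise targets one standard."""
    return min(num_exercises, num_standards)


# Hypothetical figures, chosen only for illustration.
TESTING_MINUTES = 120
NUM_STANDARDS = 30
MINUTES_PER_EXERCISE = {"multiple-choice": 1, "constructed-response": 20}

coverage = {}
for exercise_type, minutes in MINUTES_PER_EXERCISE.items():
    count = exercises_that_fit(TESTING_MINUTES, minutes)
    coverage[exercise_type] = standards_covered(count, NUM_STANDARDS)
    print(f"{exercise_type}: {count} exercises, "
          f"at most {coverage[exercise_type]} of {NUM_STANDARDS} standards")
```

Under these assumed figures, the constructed-response form can touch at most 6 of the 30 standards in the same window in which the multiple-choice form could touch all 30, which is the narrowing of coverage the paragraph above warns about.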

In a similar fashion, the time that it takes to return results to schools (and 
whether the data is returned in a timely manner so as to be useful in either 
remediation or instructional improvement) is also a practical impediment to 
innovative approaches to assessment. Constructed response exercises take 
time to score. Tests comprised of multiple-choice items can be scored and 
reported, even for large-scale programs, in a matter of days. However, it is 
more typical for large-scale constructed response assessments to take up to 
several months to return results for individual students. This time lag between 
assessment and reporting is so large that local educators may view the results 
(and the overall assessment program) as relatively useless, since the results 
come back to them so long after testing that the results cannot be relied on; 
the results of spring testing may not be returned until the following school year, 
when some of the students have moved, and the remaining students are 
dispersed to various schools and classrooms around the district. One way to 
reduce this turn-around time is to have classroom teachers score the tests, but 
this is not a popular activity for local educators. 

Technical Challenges Inherent in Using New Forms of Assessment 

There are also a number of technical challenges inherent in new forms of 
assessment, some imposed by the practical constraints covered above, and 
others inherent in these forms of assessment. These need to be addressed as 
well. Some of these challenges are as follows: 

• Scaling Issues There are a couple of aspects of the scaling issue that must 
be attended to when performance assessments are used. Due to 
practical constraints, the constructed response items used to 
assess a particular subject area may be few in number, and they may be 
purposely selected to scatter across a set of standards, with only one 
exercise used to represent any area of the assessment. Hence, the items 
may be intentionally selected not to replicate one another, which is then a 
challenge in scaling the items. 

When a "mixed" assessment model is used, an additional challenge of 
scaling multiple-choice and constructed response items together can be 
introduced. 

Finally, there can be the issue of whether the performances of special needs 
students (English-language learners and students with disabilities) can be 
placed on the same scale as those of other students. Did the accommodations 
change the nature of the assessments given to these students? Were 
special needs students who did not receive accommodations assessed in 
an appropriate manner? 

All of these are scaling issues that must be considered when determining 
the manner in which items can be aggregated. 

• Reporting Issues Just as there are additional scaling issues in the use of 
performance assessments, there are also reporting issues of a similar 
nature. This really represents an inherent conflict: performance 
assessment exercises are used because they may well contribute a unique 
understanding of students, yet the goal is to report them together with the 
multiple-choice items used on one overall report of student performance. 

If the performance assessments fail to contribute substantial unique 
variance in overall student performance, are they worth the considerable 
investment of classroom time and money to use them? Yet, if they are too 
unique, can they be reported on the same scale as the multiple-choice 
items? An interesting conundrum! 

• Generalizability Since few performance assessment exercises are used in 
a typical assessment program, can the few exercises selected truly 
represent curricular domains that sometimes are quite broad? Past 
research evidence indicates that it is difficult for the performance on one 
or a few exercises to generalize to other, supposedly comparable 
assessment exercises. How stable are the estimates of student 
performance? 

• Reporting Trend Data Another major reporting need is to report student 
performance over time. As the assessment program is used over time, 
there is a natural desire at both the state and local levels to examine 
whether achievement is improving. This means that for assessments in which 
performance exercises are used, and where the generalizability of 
these exercises is lower than for the multiple-choice sections of the 
assessments, the challenge of longitudinal reporting will need to be met. Is 
the form of assessment used each year sufficiently equivalent that 
observed differences in student performance are not due to the 
instruments used? 

• Use of Matrix Sampling One way to broaden total coverage of the 
assessment without increases in testing time per student is to administer only 
a sample of the assessments to a student by breaking the assessment into 
several pieces and giving each student only one piece of it. However, 